P018512US 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
APPLICATION PAPERS 
OF 

PAUL ANTHONY GILKERSON 
FOR 

BRANCH PREDICTION IN A DATA PROCESSING APPARATUS






BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention relates to a data processing apparatus and method for 
predicting execution of instructions.

Description of the Prior Art

A data processing apparatus will typically include a processor for executing 
instructions. Further, a prefetch unit will typically be provided for prefetching 
instructions from memory that are required by the processor, with the aim of ensuring 
that the processor has a steady stream of instructions to execute, thereby aiming to 
maximise the performance of the processor. 

Sequences of instructions to be executed by the processor are often not stored in 
memory one after the other, since software execution often involves changes in 
instruction flow that cause the processor to move between different sections of code 
dependent on the tasks being executed. An example of a change in instruction flow that 
can occur when executing software is a "branch", which results in the instruction flow
jumping to a particular section of code as specified by the branch. A branch instruction 
will typically be provided which when executed will cause that branch to take place. 
However, often such branch instructions are conditional instructions, and as such a 
decision will be made by the processor as to whether to execute such a conditional 
instruction dependent on predetermined conditions existing at the time that that 
instruction is to be executed. 

Conditional instructions can cause problems for the prefetch unit, since the 
instruction that should be prefetched by the prefetch unit will depend on whether the 
conditional instruction is going to be executed or not by the processor. To assist the 
prefetch unit in its task of retrieving instructions for the processor, prediction logic is 
often provided for predicting whether conditional instructions will be executed by the 
processor. Considering the example of a conditional branch instruction, the prediction 
logic may be arranged to predict whether the branch specified by that branch instruction 
will be taken. If the prediction logic predicts that the branch will be taken (i.e. the branch 
instruction will be executed), then it instructs the prefetch unit to retrieve the instruction 
that is specified by the branch, and clearly if the branch prediction is accurate, this will
serve to increase the performance of the processor, since at the time of execution it will
not need to stall its execution flow while that instruction is retrieved from memory. 
Typically, a record will be kept of the address of the instruction that would be required if 
the prediction made by the prediction logic was wrong, such that if the processor 
subsequently determines that the prediction was wrong, the prefetch unit can then 
retrieve the required instruction. 

A branch instruction is an example of an instruction flow changing instruction. 
Whilst prediction logic is useful in predicting whether a conditional instruction flow 
changing instruction will be executed, this is only useful if the prefetch unit is able to
determine the address (also referred to herein as the target address) in memory from
which the next instruction should be retrieved if the prediction logic predicts that that
instruction flow changing instruction will be executed. Hence, whilst it is known to
provide prefetch units with prediction logic which can provide predictions for conditional
direct branch instructions (i.e. branch instructions where the target address is specified
directly within the branch instruction with reference to a program counter (PC) value), it
is less common for a prefetch unit to provide predictions for conditional indirect branch
instructions (i.e. branch instructions where the target address is not directly specified
within the branch instruction, and instead, for example, may be specified with reference
to the contents of one or more registers), since the prefetch unit often will not have access
to the information required to determine the target address. Instead, such conditional
indirect branch instructions are often ignored by the prefetch unit, and instead the
prefetch unit merely continues to fetch instructions from an incremented PC value. If the
processor subsequently executes such an indirect branch instruction, then it will typically 
issue a signal back to the prefetch unit to cause the prefetch unit to prefetch the next 
instruction required by the processor from an address specified by the processor, the 
processor having access to the registers, and hence being able to determine the required 
address for that next instruction. 

One example where an unconditional indirect branch instruction has had a
prediction of its target address made by the prefetch unit is in the ARM 10 processor
developed by ARM Limited, where an unconditional procedure return instruction had its
target address predicted with reference to a single address register maintained by the
prefetch unit identifying a target address.

In addition to normal conditional branch instructions, there are other types of 
conditional instruction flow changing instructions which are not typically considered as 
branches. As an example, certain branch instructions may cause the flow of execution to 
branch to a particular procedure or sub-routine, and once that procedure has been 
executed, instruction flow then returns to the next instruction following the original
branch instruction. To cause the return to take place, a procedure return instruction can
be executed in order to change the value of the program counter to a return address value,
so as to cause execution to return to the appropriate point, i.e. the instruction immediately
following the original branch instruction. The required return address value for the
instruction to be returned to will typically have been stored within a register, or stored out
to memory with a pointer to that return address value in memory then being stored within
a register. There are a variety of formats of such procedure return instructions, for
example move instructions, load instructions, add instructions, etc, but they all typically
involve updating the PC value with a return address value obtained with reference to a
register and/or memory. 

Such procedure return instructions are a type of instruction flow changing
instruction, since they cause the PC value to be changed, but they are not typically
considered to be branch instructions and hence they are not typically reviewed by the
prefetch unit, or predicted by the prediction logic, particularly when in conditional form.
Furthermore, it can be seen that since they typically cause the PC value to be updated
with reference to the register and/or memory, they are an indirect instruction flow
changing instruction, and as such the prefetch unit would typically be unable to calculate
the return address. Hence, such conditional instruction flow changing instructions have
not been considered by the prefetch unit and instead the prefetch unit has merely
continued to fetch the next instruction from an incremented PC value.

When branch instructions that specify a branch to a particular procedure or
sub-routine are executed, it is known to provide a return stack in which a return address can
be specified. Typically, such a return stack is maintained by the processor. When the
processor is then to execute an instruction flow changing instruction of the type where
the PC value is updated with reference to the contents of a register and/or memory, then
the return stack can be used to predict the new PC value prior to it actually being
determined by the processor so as to allow an early indication of the address to be
provided to the prefetch unit. It will be appreciated that this may provide an accurate
address in situations where the instruction flow changing instruction is in fact the
procedure return instruction associated with that earlier branch instruction. This
approach is used for such instruction flow changing instructions when they are
unconditional, since it is then certain that the processor will be executing the instruction,
and accordingly it is worth making the prediction with regards to the contents of the
return stack.

It is an object of the present invention to provide an improved technique for 
predicting execution of conditional instruction flow changing instructions. 

SUMMARY OF THE INVENTION 

Viewed from a first aspect, the present invention provides a data processing 
apparatus, comprising: a processor operable to execute instructions; a prefetch unit 
operable to prefetch instructions from a memory prior to sending those instructions to the
processor for execution, the prefetch unit being operable to determine for a prefetched
instruction whether that prefetched instruction is an instruction flow changing
instruction, and based thereon to determine a fetch address for a next instruction to be
prefetched by the prefetch unit; a return stack accessible by the prefetch unit and operable
to hold one or more addresses; and prediction logic operable, if the prefetched instruction
is a conditional instruction, to predict whether that prefetched instruction will be
executed by the processor, the prefetch logic being operable to determine the fetch
address dependent on the prediction from the prediction logic; in the event that the
prefetched instruction is a first type of instruction flow changing instruction and is
conditional, and the prediction logic predicts that that prefetched instruction will be
executed, the prefetch logic being operable to determine as the fetch address an address
obtained from the return stack.

In accordance with the present invention, a data processing apparatus has a
processor operable to execute instructions, and a prefetch unit for prefetching instructions
from memory prior to sending those instructions to the processor for execution. Further,
prediction logic is provided to predict whether a conditional prefetched instruction will
be executed, with the prefetch logic determining a fetch address for a next instruction
dependent on that prediction. Further, a return stack is provided which is accessible by
the prefetch unit and operable to hold one or more addresses. Then, in the event that the
prefetched instruction is a first type of instruction flow changing instruction and is
conditional, and the branch prediction logic predicts that that prefetched instruction will
be executed, the prefetch logic is operable to determine as the fetch address an address
obtained from the return stack.

By this approach, the prediction logic can provide a prediction about the 
execution of a first type of instruction flow changing instruction that is conditional. 
Furthermore, even if the first type of instruction flow changing instruction is an indirect 
instruction flow changing instruction (i.e. the new PC value following execution of the
instruction flow changing instruction is not derivable directly from the instruction flow
changing instruction itself), the prefetch unit is able to establish a predicted address for a 
next instruction to be prefetched with reference to the contents of the return stack. 

It will be appreciated that in the present context the term "next" instruction is a 
reference to the next instruction to be prefetched by the prefetch unit following analysis 
of the prefetched instruction under consideration, and does not imply that in the interim 
period no other instructions will have been prefetched. Indeed, it will typically be the 
case that whilst the prefetch unit is analysing a particular prefetched instruction, one or 
more other instructions may be in the process of being prefetched from the memory, and
accordingly the next instruction as referred to above refers to the next instruction to be
prefetched as a result of the analysis of a current prefetched instruction.

In the context of the present application, the term "executed" as used in
connection with a conditional instruction refers to the actual execution of the instruction
following a determination having been made by the processor that the conditions
associated with that instruction have been met. It will be appreciated that a conditional
instruction will typically have to be at least partially decoded by the processor in order to
enable the conditional information contained in that instruction to be derived and then
analysed. However, for the purposes of the present application, this process is not
considered to be execution of the conditional instruction, and execution of the
conditional instruction will only be considered to occur if the conditions specified by the
instruction have been met. If the conditions specified by the instruction are not met, the 
conditional instruction is considered as not being executed. 

It will be appreciated that the first type of instruction flow changing instruction 
may take a variety of forms. However, in one embodiment, the first type of instruction
flow changing instruction is a procedure return instruction operable when executed to
cause the processor to return from a procedure being executed by the processor. Hence,
assuming any conditions associated with this procedure return instruction are met, and
hence the procedure return instruction is executed, this will cause the processor to return
from the procedure being executed by the processor. However, if any conditions
associated with the procedure return instruction are not met, then no return from the
procedure will take place, and instead the next sequential instruction from memory will
be executed. In many existing systems, such procedure return instructions are not
conditional. However, in accordance with embodiments of the present invention, such
procedure return instructions can be conditional, and the prediction logic can be arranged
to predict whether such conditional procedure return instructions will be executed. In the
event that the prediction logic predicts that the procedure return instruction will be
executed, the prefetch unit can prefetch an instruction which is predicted to be the
required instruction following the return from the procedure, the address for this
instruction being identified with reference to the content of the return stack.

It will be appreciated that there are a number of different ways in which addresses 
may be added to the return stack. However, in one embodiment, if the prefetch logic
determines that the prefetched instruction is a second type of instruction flow changing
instruction, the prefetch logic is further operable to determine a return address and to
cause that return address to be placed on the return stack.

The prefetch logic may determine the return address in a variety of ways.
However, typically, the return address will be calculated by incrementing the address of
the prefetched instruction currently being analysed by the prefetch logic.

In one particular embodiment, the second type of instruction flow changing
instruction is a branch with link instruction, which is operable to identify a start address
for a procedure to be executed by the processor, upon returning from the procedure the
next instruction to be executed by the processor being specified by the return address. In
such embodiments, the procedure is typically returned from by execution of one of the
first type of instruction flow changing instructions. Hence, in accordance with such 
embodiments of the present invention, prediction logic can be used to predict an outcome 
of any first type of instruction flow changing instruction that is conditional, and the 
content of the return stack can be used to predict the return address in the event that the 
prediction logic predicts that that first type of instruction flow changing instruction will 
be executed. This enables the prefetch unit to retrieve as a next instruction the 
instruction which is predicted to be required by the processor upon execution of that first 
type of instruction flow changing instruction. 
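The interplay between the two instruction types can be illustrated with a minimal Python sketch. It is not taken from the embodiments: the class name `ReturnStack`, its method names, the fixed 4-byte instruction size and the example addresses are all assumptions made purely for illustration.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed 4-byte instructions


class ReturnStack:
    """Hypothetical model of a return stack accessible by the prefetch unit."""

    def __init__(self):
        self._addresses = []

    def on_branch_with_link(self, instruction_address):
        # Second type of instruction: push the address of the instruction
        # that follows the branch with link.
        self._addresses.append(instruction_address + INSTRUCTION_SIZE)

    def predict_return_address(self):
        # First type of instruction, predicted as executed: pop the most
        # recently pushed return address (None if nothing was pushed).
        return self._addresses.pop() if self._addresses else None


stack = ReturnStack()
stack.on_branch_with_link(0x1000)  # outer procedure call
stack.on_branch_with_link(0x2000)  # nested procedure call
inner_return = stack.predict_return_address()
outer_return = stack.predict_return_address()
```

In this sketch the nested call returns first, so the address pushed last is the first one predicted.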

The prediction logic may take a variety of forms. In one embodiment, the
prediction logic may be static prediction logic which is arranged to make a prediction
about the likely outcome of an instruction flow changing instruction only using
information in the instruction itself. In practice, this usually means using characteristics
like the direction of the change in instruction flow to make a prediction. As an example,
backwards changes in instruction flow (i.e. changes in instruction flow that cause a return
to an instruction with a lower address) are typically found at the end of loops and are
therefore generally considered to be taken more times than not taken, whereas forwards
changes in instruction flow (i.e. changes that move to an instruction with a higher
address) are more likely not to be taken. Therefore, it is common that
static prediction logic is arranged to predict backwards changes in instruction flow as
taken, and forwards changes in instruction flow as not taken.
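The direction-based rule described above can be sketched in a few lines of Python. The function name and the example addresses are hypothetical, and the treatment of a target equal to the instruction's own address is an arbitrary choice of this sketch.

```python
def static_predict_taken(instruction_address, target_address):
    """Predict backwards changes in flow as taken, forwards as not taken.

    A target equal to the instruction address is treated as not taken here,
    which is an arbitrary choice made for this illustration.
    """
    return target_address < instruction_address


loop_branch = static_predict_taken(0x1040, 0x1000)  # jumps back to a loop start
skip_branch = static_predict_taken(0x1040, 0x1080)  # jumps forward over code
```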

However, in one embodiment, the branch prediction logic is a dynamic prediction
logic which is operable to provide a prediction as to whether the prefetched instruction
will be executed by the processor dependent upon history information identifying an
outcome of conditional instructions previously executed by the processor. It has been
found that dynamic branch prediction can provide accurate prediction for instruction flow
changing instructions that include instructions of the first type of instruction flow
changing instruction, for example procedure return instructions.

It will be appreciated that prediction logic can be located at any suitable location
within the data processing apparatus. However, in one embodiment, the prediction logic
is provided within the prefetch unit. Similarly, the return stack can be located at any
appropriate position, provided that it is accessible by the prefetch unit. However, in one 
embodiment, the return stack is provided within the prefetch unit. 

The prefetch unit can be arranged in a variety of ways. However, in one
embodiment, the prefetch unit comprises decode logic operable to determine for the
prefetched instruction whether that prefetched instruction is an instruction flow changing
instruction, and control logic operable in response to the decode logic to determine the
fetch address for the next instruction to be prefetched by the prefetch unit. 

Viewed from a second aspect, the present invention provides a method of 
operating a data processing apparatus comprising a processor operable to execute
instructions, a prefetch unit operable to prefetch instructions from a memory prior to
sending those instructions to the processor for execution, and a return stack accessible by
the prefetch unit and operable to hold one or more addresses, the method comprising the
steps of: (a) determining for a prefetched instruction whether that prefetched instruction
is an instruction flow changing instruction, and based thereon determining a fetch
address for a next instruction to be prefetched by the prefetch unit; (b) if the prefetched
instruction is a conditional instruction, predicting whether that prefetched instruction will
be executed by the processor, and at said step (a) determining the fetch address
dependent on the prediction; and (c) in the event that the prefetched instruction is a first
type of instruction flow changing instruction and is conditional, and if said step (b)
predicts that that prefetched instruction will be executed, determining as the fetch address
an address obtained from the return stack.

BRIEF DESCRIPTION OF THE DRAWINGS 
The present invention will be described further, by way of example only, with
reference to preferred embodiments thereof as illustrated in the accompanying drawings, 
in which: 

Figure 1 is a block diagram illustrating a data processing apparatus connected to a 
memory in accordance with one embodiment of the present invention;

Figure 2 is a block diagram illustrating the operation of the dynamic branch 
predictor of figure 1;

Figure 3 schematically illustrates the operation of a return stack in accordance
with one embodiment of the present invention; 

Figures 4A and 4B are flow diagrams illustrating processing performed within 
the prefetch unit in accordance with one embodiment of the present invention;

Figure 5A is a flow diagram illustrating certain processing performed within the 
processor core when receiving a procedure return instruction that the prefetch unit has 
predicted as not being executed; and 

Figure 5B is a flow diagram illustrating certain processing performed within the 
processor core when receiving a procedure return instruction that the prefetch unit has
predicted as being executed.

DESCRIPTION OF PREFERRED EMBODIMENTS 
Figure 1 is a block diagram of a data processing apparatus connected to a
memory 10 in accordance with one embodiment of the present invention. The data 
processing apparatus includes a pipelined core 30 having a pipelined execution unit for 
executing instructions. The prefetch unit 20 is arranged to prefetch instructions from 
memory 10 that are required by the pipelined processor core 30, with the aim of 
ensuring that the processor core 30 has a steady stream of instructions to execute, 
thereby aiming to maximise the performance of the processor core 30. 

Prefetch control logic 55 is arranged to issue control signals to the memory 10 to 
control the fetching of instructions from the memory, whilst in addition providing a
control signal to the multiplexer 46 to cause a fetch address to be output to the memory 
10 identifying the address of the instruction to be prefetched. The prefetched instruction 
is then returned from the memory 10 to the instruction buffer 70, from where it is then 
output from the prefetch unit to the core 30. At any particular point in time, there will 
typically be a plurality of instructions within the instruction buffer 70 waiting to be sent
to the core 30 as and when required by the core. The instruction buffer 70 typically is 
arranged to act as a First-In-First-Out (FIFO) buffer. 

Decode logic 65 is provided within the prefetch unit 20, and is operable for each
prefetched instruction in the instruction buffer to partially decode that instruction in order
to determine whether that prefetched instruction is an instruction flow changing
instruction. This will typically be done by analysing certain bits of the opcode of the
instruction in order to determine the instruction type. The decode logic 65 may be
arranged to detect a number of different types of instruction flow changing instruction.
In one particular embodiment, the decode logic 65 is arranged to partially decode each
prefetched instruction in order to determine whether it is a branch instruction, a branch
with link instruction, or a procedure return instruction. There are a number of different
procedure return instructions that may be used to cause the processor to return from a
procedure that is currently executing, and accordingly in some embodiments of the
present invention the decode logic 65 will analyse the prefetched instruction to determine
from various bits of the opcode whether that instruction is one of a number of different
versions of a procedure return instruction. In one particular embodiment, the decode
logic may look for the following types of procedure return instruction:

1) MOV PC, R14

2) BX R14

3) LDM R13, {PC, ...}

4) LDR PC, [R13]

5) POP

The first four above instructions are from the ARM instruction set developed by
ARM Limited. The move instruction "MOV" is arranged to update the program counter
(PC) value with the contents of the register R14. Hence, when this instruction is used,
the register R14 will previously be loaded with the return address required when
returning from the procedure being executed by the processor. The branch exchange
instruction "BX" branches to the address stored in the register R14, and may also cause a
change in instruction set to take place. The two types of load instruction "LDM" and
"LDR" are both arranged to obtain from memory the data identified by an address stored
in the register R13 and to then use that data as the new PC value. Hence, when these
versions of procedure return instruction are used, the return address will previously have
been stored to memory, with its location in memory being stored within the register R13.
Finally, the POP instruction is an instruction specified by the Thumb instruction set
developed by ARM Limited, the Thumb instruction set comprising 16-bit instructions.
The POP instruction performs a similar function to the specific move instruction "MOV"
discussed earlier.
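As a rough illustration of the classification performed for the five instruction forms listed above, the following Python sketch matches assembly text against patterns. This is a deliberate substitution: the decode logic described here examines opcode bits, not text, and the pattern list and the function name `is_procedure_return` are hypothetical.

```python
import re

# Hypothetical textual patterns standing in for opcode-bit checks.
RETURN_PATTERNS = [
    re.compile(r"^MOV\s+PC,\s*R14$"),
    re.compile(r"^BX\s+R14$"),
    re.compile(r"^LDM\s+R13,\s*\{.*\bPC\b.*\}$"),
    re.compile(r"^LDR\s+PC,\s*\[R13\]$"),
    # The list above names POP without operands, so this sketch accepts any POP.
    re.compile(r"^POP\b"),
]


def is_procedure_return(assembly_text):
    """Return True if the text matches one of the listed return forms."""
    text = assembly_text.strip().upper()
    return any(pattern.search(text) for pattern in RETURN_PATTERNS)
```

For example, `is_procedure_return("BX R14")` is true under these patterns, whereas `is_procedure_return("BX R3")` is not, since only R14 is taken to hold the return address.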
When the decode logic has analysed the prefetched instruction, it outputs a
control signal to the prefetch control logic 55 indicative of the results of that analysis.
This information is used by the prefetch control logic when determining which of the
various inputs to the multiplexer 46 should be output as the fetch address for the next
instruction to be prefetched by the prefetch unit 20. The prefetch control logic 55 is also
arranged to receive a signal from the dynamic branch predictor 60 which, for conditional
instructions, is arranged to provide an indication as to whether the instruction is predicted
as being taken (i.e. executed) or not taken. As will be discussed in more detail with
reference to figure 2, the dynamic branch predictor 60 is provided with a confirm signal
from the core 30, giving history information about the outcome of previously executed
conditional instructions, this history information being used to update information
maintained by the dynamic branch predictor and used to make future predictions.

The dynamic branch predictor 60 can be arranged to output a signal to the 
prefetch control logic every cycle, but in that instance the prefetch control logic 55 is 
only arranged to consider the signal that it receives from the dynamic branch predictor 60
in instances where the decode logic 65 has identified that the prefetched instruction being 
analysed by the decode logic is a conditional instruction flow changing instruction. 

With regards to the inputs to the multiplexer 46, a current PC value is stored in 
register 40, which receives its input from the output of the multiplexer 46. The output 
from the register 40 is then fed to an incrementer 42 which then provides an incremented 
version of the PC value as an input to the multiplexer 46. Address generation logic 44 is
arranged to receive an immediate value from decode logic 65 in situations where the
decode logic 65 identifies either a branch instruction or a branch with link instruction
containing an immediate value. Both the branch instruction and the branch with link
instruction may specify within the instruction an immediate value, also referred to herein
as an offset value, which when added to the address of that instruction (in some cases
along with a predetermined constant value) will give an indication of the target address
for the next instruction required by the processor should that branch or branch with link
instruction be executed by the processor. The address generation logic 44 has access to
the address of each instruction analysed by the decode logic, and if provided with an
immediate value over path 67 performs this addition and then provides the generated
target address as an input to the multiplexer 46.
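The addition performed by the address generation logic 44 can be sketched as follows. The function name, the 32-bit address mask and the constant value of 8 used in the examples are assumptions for illustration; the text above only states that a predetermined constant is added in some cases.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumption: 32-bit address space


def generate_target_address(instruction_address, offset, constant=0):
    """Add the signed offset (and an optional constant) to the instruction address."""
    return (instruction_address + offset + constant) & ADDRESS_MASK


forward_target = generate_target_address(0x8000, 0x40, constant=8)
backward_target = generate_target_address(0x8000, -0x20, constant=8)
```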

A return stack 50 is also provided within the prefetch logic, and is arranged as a 
push/pop stack which is controlled by the prefetch control logic 55. Each time the 
decode logic identifies a branch with link instruction, the prefetch control logic is 
arranged to push a return address onto the return stack 50, the return address typically 
being determined by incrementing the address of the branch with link instruction. When 
a procedure return instruction is identified by the decode logic 65, the prefetch control
logic 55 is then arranged to cause an address from the return stack to be popped from the
return stack and output as an input to the multiplexer 46, in this instance the prefetch
control logic 55 also sending a control signal to the multiplexer 46 to cause the address
popped from the return stack to be output as the fetch address to memory 10. Through
this mechanism, the prefetch unit is able, in situations where the decode logic has
identified a procedure return instruction, and the dynamic branch predictor 60 has
predicted that that procedure return instruction will be executed (or the procedure return
instruction is unconditional), to predict a return address with reference to the return stack,
and accordingly retrieve the instruction at that predicted return address from memory. In
situations where the prediction is correct, this will provide a significant performance
improvement of the pipelined core 30. However, the pipelined core 30 will ultimately
need to determine that the prediction made by the dynamic branch predictor 60 is correct,
and in the event that it is correct will also need to check that the predicted return address
determined with reference to the return stack 50 is also correct.
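The choice made by the prefetch control logic 55 among the multiplexer inputs can be summarised in a short Python sketch. It is a simplified model under stated assumptions, not the embodiment itself: the function name, the string labels for instruction kinds, the 4-byte instruction size and the fallback to sequential fetching when the stack is empty are all hypothetical.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed 4-byte instructions


def select_fetch_address(kind, conditional, predicted_taken,
                         instruction_address, incremented_pc,
                         target_address, return_stack):
    """Pick the next fetch address, mimicking the multiplexer inputs.

    kind is "branch", "branch_link", "return" or "other" (hypothetical labels).
    """
    if kind == "branch_link":
        # Pushed each time a branch with link is identified, as described above.
        return_stack.append(instruction_address + INSTRUCTION_SIZE)
    if kind == "other" or (conditional and not predicted_taken):
        return incremented_pc  # continue sequential fetching
    if kind == "return":
        # Assumption: fall back to sequential fetching if the stack is empty.
        return return_stack.pop() if return_stack else incremented_pc
    return target_address  # branch or branch with link


stack = []
call_fetch = select_fetch_address("branch_link", False, False,
                                  0x1000, 0x1008, 0x5000, stack)
not_taken_fetch = select_fetch_address("return", True, False,
                                       0x5010, 0x5018, None, stack)
taken_fetch = select_fetch_address("return", True, True,
                                   0x5014, 0x501C, None, stack)
```

In this sketch a conditional procedure return predicted as not executed leaves the return stack untouched, so a later return predicted as executed still obtains the address pushed by the call.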

In situations where the pipelined core subsequently determines that any
prediction made by the prefetch unit, whether that relates to a branch instruction, a
branch with link instruction, or a procedure return instruction, is inaccurate, then it will
typically calculate the address for the instruction that is required next by the core, and
will output that address as a forced address back to the prefetch unit 20, this forced
address being input to the multiplexer 46. In that instance, the prefetch control logic 55
will cause the multiplexer 46 to output as the fetch address the forced address returned
from the pipelined core 30. Further, the current contents of the instruction buffer 70 will
typically be flushed, such that when the required instruction is returned from memory,
that instruction can be output to the pipelined core 30.
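The recovery path described above can be modelled with a small Python sketch. The class `PrefetchUnitModel` and its methods are hypothetical, and the buffer holds fetch addresses as stand-ins for the instructions themselves.

```python
from collections import deque


class PrefetchUnitModel:
    """Hypothetical model of the instruction buffer and forced-address recovery."""

    def __init__(self):
        self.instruction_buffer = deque()  # FIFO; addresses stand in for instructions
        self.fetch_address = None

    def prefetch(self, address):
        self.fetch_address = address
        self.instruction_buffer.append(address)

    def force_address(self, forced_address):
        # Misprediction reported by the core: flush the wrongly fetched
        # entries, then fetch from the address supplied by the core.
        self.instruction_buffer.clear()
        self.prefetch(forced_address)


unit = PrefetchUnitModel()
unit.prefetch(0x5000)      # predicted path
unit.prefetch(0x5004)
unit.force_address(0x1004)  # core signals the correct address
```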

The dynamic branch predictor logic 60 of figure 1 is arranged to operate in a
known manner, as will briefly be discussed with reference to figure 2. However, in
addition to receiving history information about the outcome of branch and branch with
link instructions, the dynamic branch predictor logic 60 will also receive history
information concerning the outcome of any procedure return instructions that are
conditional. The dynamic branch predictor 60 includes a Pattern History Table (PHT),
which identifies for a particular index whether a prediction of "taken" or "not taken"
should be output. In a known manner, the confirmed history 110 for a number of
preceding conditional instructions, for example eight preceding instructions, is
maintained by the dynamic branch predictor 60 using outcome information returned from
the core 30, and then sent over path 112 to a write port of the PHT 100 to cause the
contents of the PHT to be updated.

Also as known, a prediction history 105 is maintained by the dynamic branch
predictor 60, which is updated each time the decode logic 65 determines that a 
conditional instruction flow changing instruction has been detected. In that instance, an 
appropriate control signal (e.g. a logic one value) is received by the prediction history 
logic 105 over path 120, in order to cause a current prediction over path 102 to be loaded 
) into the prediction history 105. The prediction history will maintain details of the 
predictions made for a certain number of preceding instruction flow changing 
instmctions analysed by the decode logic 65, for example eight instruction flow changing 
instractions, and that prediction history will be used to generate an index over path 107 in 
order to access the PHT 100. As a result of accessing the PHT 100 with the index 
provided over path 107, a prediction will be output over path 102 from the dynamic 
branch predictor 60 to the prefetch control logic 55. It will be noted that, at any point in 
time, the conditional instructions on which the prediction history 105 is based will be 
different to the conditional instructions on which the confirmed history 1 10 is based, due 
to the time delay between analysis of a conditional instruction by the decode logic 65, 
and the passage of that instruction through the execution pipeline of the core 30. 
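
By way of illustration only, the interaction between the prediction history 105, the 
confirmed history 110 and the PHT 100 may be sketched in Python as follows. The 
sketch is not taken from the described embodiment: the class and method names are 
hypothetical, each PHT entry is assumed to hold a single taken/not-taken bit that is 
initially "not taken", and the PHT is assumed to be written at the index formed by 
the confirmed history as it stood before the newly confirmed outcome is shifted in. 

```python
HISTORY_BITS = 8  # "for example eight" preceding conditional instructions


class DynamicBranchPredictor:
    """Hypothetical model of a history-indexed Pattern History Table."""

    def __init__(self):
        # Assumption: one bit per entry, every entry initially "not taken".
        self.pht = [False] * (1 << HISTORY_BITS)
        self.prediction_history = 0  # models prediction history 105
        self.confirmed_history = 0   # models confirmed history 110

    @staticmethod
    def _shift(history, taken):
        # Shift the newest outcome in and keep only the last HISTORY_BITS.
        return ((history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)

    def predict(self):
        """Called when a conditional flow changing instruction is decoded."""
        taken = self.pht[self.prediction_history]  # index over path 107
        # The current prediction is loaded into the prediction history.
        self.prediction_history = self._shift(self.prediction_history, taken)
        return taken  # prediction over path 102

    def confirm(self, taken):
        """Called when the core reports the actual outcome."""
        # Assumption: the PHT is written at the pre-update confirmed history.
        self.pht[self.confirmed_history] = taken
        self.confirmed_history = self._shift(self.confirmed_history, taken)
```

Because predict() runs at decode time and confirm() runs only once the core has 
resolved the instruction, the two histories in this sketch will generally describe 
different sets of conditional instructions at any given moment, as noted above. 
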

Figure 3 schematically illustrates the return stack 50 of figure 1. The return stack 
50 shown in figure 3 has three entry locations 200, 210, 220 for receiving return 
addresses pushed onto the return stack under the control of the prefetch control logic 55. 
Return addresses are popped from the return stack in accordance with a "First In Last 
Out" policy. 

As a result, it can be seen that if a branch with link instruction is detected by the 
decode logic, thus causing a return address to be pushed onto the return stack 50, and that 
branch instruction is then subsequently executed by the processor, then if the first 
instruction subsequently executed by the processor that is detected as a procedure return 
instruction is in fact the procedure return instruction for the procedure identified by the 
branch with link instruction, then provided the prefetch unit has predicted that procedure 
return instruction as being executed, the information obtained from the return stack will 
have provided an accurate address for the instruction now required by the processor core. 
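
A minimal Python sketch of such a three-entry return stack is given below for 
illustration. The class name is hypothetical, and two behaviours that the description 
does not specify are assumed: when the stack is full the oldest entry is discarded, 
and popping an empty stack yields None. 

```python
class ReturnStack:
    """Hypothetical model of a small return stack (three entries by default)."""

    def __init__(self, depth=3):
        self.depth = depth
        self.entries = []  # most recently pushed address is last

    def push(self, return_address):
        # Assumption: on overflow the oldest return address is dropped.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def pop(self):
        # Assumption: an empty stack provides no prediction (None).
        return self.entries.pop() if self.entries else None
```

With this model, the address popped for the first procedure return instruction 
detected after a branch with link instruction is the return address pushed for that 
branch with link instruction, which is the case described above. 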

Figures 4A and 4B are flow diagrams illustrating processing performed within 
the prefetch unit 20 in accordance with one embodiment of the present invention. At 
step 300, the decode logic 65 partially decodes the instruction in order to determine the 
instruction type. Then, at step 305 it is determined whether the instruction is a procedure 
return instruction. If so, the process proceeds to step 315, where it is determined whether 
that procedure return instruction is a conditional instruction. These steps will typically be 
performed by the decode logic 65, and will result in the issuance of appropriate control 
signals to the prefetch control logic 55. 

If at step 315 it is determined that the instruction is conditional, then at step 320 a 
look up will be performed in the branch predictor 60 in order to obtain a branch 
prediction which will be input to the prefetch control logic 55. Then, at step 325, the 
prefetch control logic 55 will determine whether the branch is predicted as taken. If not, 
then at step 330 the prefetch control logic 55 will control the multiplexer 46 to output as 
the fetch address the incremented PC value as received from the increment logic 42. 
However, if the branch is predicted as taken at step 325, or the instruction is determined 
not to be conditional at step 315, the process proceeds to step 335, where the prefetch 
control logic 55 causes an address to be popped from the return stack 50, and causes the 
multiplexer 46 to then output that popped address as the fetch address. 



If at step 305, it is determined that the currently analysed prefetched instruction is 
not a procedure return instruction, then the process branches to figure 4B, where at step 
400 it is determined whether the instruction is a branch with link (BL) instruction. If the 
instruction is a BL instruction, then the process proceeds to step 410, where it is 
determined whether that instruction is a conditional instruction. The steps 400, 410 are 
performed within the decode logic 65, and cause appropriate control signals to be issued 
to the prefetch control logic 55. 

If the instruction is determined at step 410 to be conditional, then the process 
proceeds to step 415, where a look up is performed in the branch predictor 60 in order to 
obtain a branch prediction, which is then input to the prefetch control logic 55. At step 
420, the prefetch control logic 55 then determines whether the branch is predicted as 
taken, and if not, controls the multiplexer 46 to output as the fetch address the 
incremented PC value received from the incrementer 42. 

However, if at step 420 it is determined that the branch is predicted as taken, or if 
at step 410 it is determined that the instruction is not conditional, the process proceeds to 
step 425 where an incremented version of the address of the BL instruction decoded at 
step 300 is pushed onto the return stack 50. Thereafter, the process proceeds to step 450, 
where an immediate value decoded from the BL instruction is added by address 
generation logic 44 to the address of the BL instruction to produce a target address which 
is then output via the multiplexer 46 as the fetch address. 

If at step 400 it is determined that the instruction is not a BL instruction, then it is 
determined at step 405 whether the instruction is a branch (B) instruction. If the 
instruction is a B instruction, then the process proceeds to step 435, where it is 
determined whether that instruction is conditional. Again, steps 405 and 435 are 
performed within the decode logic 65 and result in appropriate control signals being 
issued to the prefetch control logic 55. 

If the instruction is determined to be conditional then the process proceeds to step 
440, where a look up is performed with the branch predictor 60 in order to obtain a 
branch prediction, whereafter at step 445 the prefetch control logic 55 determines 
whether the branch is predicted as taken. If not, then the process proceeds to step 430, 
where the incremented PC value from the incrementer 42 is output as the fetch address. 
However, if the branch is predicted as taken at step 445, or if at step 435 it is determined 
that the branch instruction is not conditional, then the process proceeds to step 450, 
where an immediate value is added to the address of the B instruction by address 
generation logic 44 in order to generate a fetch address to be output via the multiplexer 
46. 

If at step 405, it is determined that the instruction is not a branch instruction, then 
in the above described embodiment of the present invention, this means that the 
prefetched instruction is not an instruction flow changing instruction, and in that instance 
the process proceeds directly to step 430, where the incremented PC value from 
incrementer 42 is output as the fetch address. 
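
The decisions of figures 4A and 4B can be summarised, purely as an illustrative 
sketch, by the following Python function. All names are hypothetical. The sketch 
assumes a fixed instruction size of four bytes, treats the incremented PC value as the 
address of the analysed instruction plus that size, models the return stack 50 as a 
plain list, and falls back to the sequential address if the return stack is empty; 
none of these details is stated in the description. 

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width four-byte instructions


def select_fetch_address(kind, conditional, predicted_taken,
                         address, immediate, return_stack):
    """Return the next fetch address for one partially decoded instruction.

    kind is 'return', 'bl', 'b' or anything else for a non-branching
    instruction; return_stack is a list whose last item is the newest entry.
    """
    sequential = address + INSTRUCTION_SIZE  # models incrementer 42
    if kind not in ("return", "bl", "b"):
        return sequential                    # step 430
    if conditional and not predicted_taken:
        return sequential                    # steps 330 / 430
    if kind == "return":
        # Step 335: pop the predicted return address.
        # Assumption: fall back to the sequential address when empty.
        return return_stack.pop() if return_stack else sequential
    if kind == "bl":
        return_stack.append(sequential)      # step 425: push return address
    return address + immediate               # step 450: models logic 44
```

For example, an unconditional BL at address 0x100 with an immediate of 0x40 pushes 
0x104 and selects 0x140 as the fetch address; a later procedure return instruction 
predicted as taken then selects 0x104. 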

When a procedure return instruction is output from the instruction buffer 70 to 
the processor core 30, the prefetch unit will also provide to the core an indication as to 
whether that procedure return instruction was predicted as being taken or not taken. In 
addition, if the prefetch unit has predicted the instruction as being taken, the prefetch unit 
provides the fetch address that was popped from the return stack. Further, in this 
instance, the prefetch unit will also provide to the processor core 30 a recover-PC address 
comprising an incremented address calculated by incrementing the address of the 
procedure return instruction, as this address will be needed if it is later determined that 
the instruction should in fact not be executed. As will be appreciated by those skilled in 
the art, a recover-PC value will be associated with each instruction as it moves through 
the processor core, and represents an address which can be used in the event that the 
prediction is determined to be incorrect, i.e. it provides an indication of the opposite 
outcome to that predicted by the prefetch unit. 

Figure 5A is a flow diagram illustrating certain processing performed within 
the processor core when receiving a procedure return instruction that the prefetch unit 
has predicted as not being executed. At step 500, the processor core will determine 
whether the procedure return instruction should in fact be executed, this being done by 
checking of the appropriate condition codes. If it is determined that the instruction 
should not be executed, i.e. the prediction was correct, then the process proceeds to 
step 510, where no further action is required. However, if at step 500 it is determined 
that the instruction should be executed, then the process proceeds to step 520, where 
the instruction is executed, and the result is used as a forced address to issue to the 
prefetch unit. This will then cause the prefetch unit to prefetch the instruction that is 
next required by the processor core, and to then route that instruction back to the 
processor core. 

Figure 5B is a flow diagram illustrating certain processing performed by the 
processor core when receiving a procedure return instruction that the prefetch unit has 
predicted as being executed. At step 525, it is determined whether the instruction is to 
be executed, again this being done by checking the appropriate condition codes. If it is 
determined that the instruction should not be executed, i.e. the prediction was wrong, 
then at step 530 the recover-PC value is issued as the forced address to the prefetch 
unit. 

If at step 525, it is determined that the instruction is to be executed, then the 
instruction is executed at step 535. However, additionally, as the fetch address popped 
from the return stack is only a prediction of the target address, then at step 540 an 
additional check is performed to check whether the fetch address popped from the 
return stack is the same as the address provided as the instruction result. If it is, then 
the process proceeds to step 550, where no further action is required. However, if it is 
determined that the fetch address is different to the result of the instruction, then the 
process proceeds to step 545, where the address produced as a result of the instruction's 
execution is output as a forced address to the prefetch unit. This will cause the 
prefetch unit 20 to retrieve the required instruction from memory, from where it can be 
routed to the processor core. 
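
The checks of figures 5A and 5B may be sketched in Python as follows, for 
illustration only. The function and parameter names are hypothetical; the function 
returns the forced address to issue to the prefetch unit, or None where no further 
action is required. 

```python
def check_procedure_return(predicted_taken, condition_passed, actual_target,
                           predicted_target=None, recover_pc=None):
    """Return the forced address to send to the prefetch unit, or None.

    condition_passed models the result of checking the condition codes and
    actual_target models the address produced by executing the instruction.
    """
    if not predicted_taken:
        # Figure 5A: the instruction was predicted as not being executed.
        if not condition_passed:
            return None           # step 510: prediction correct
        return actual_target      # step 520: execute, force the real target
    # Figure 5B: the instruction was predicted as being executed.
    if not condition_passed:
        return recover_pc         # step 530: resume at the recover-PC value
    if predicted_target == actual_target:
        return None               # step 550: return stack address was right
    return actual_target          # step 545: return stack address was wrong
```

A return value of None corresponds to steps 510 and 550; any other value 
corresponds to a forced address input to the multiplexer 46. 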

From the above discussion of an embodiment of the present invention, it will be 
appreciated that the prefetch unit 20 of embodiments of the present invention provides 
for prediction of the outcome of more types of instruction flow changing instruction than 
previously could be predicted. In particular, for procedure return instructions, which are 
an example of an indirect instruction flow changing instruction, a prediction can be made 
as to whether a conditional procedure return instruction will be executed, and in the event 
that it is predicted that the instruction will be executed, the prefetch unit can also predict 
the return address through reference to the return stack. Accordingly the prefetch unit 20 
can prefetch an instruction which is predicted to be the instruction that will be required 
by the processor core 30 once the procedure return instruction has been executed by the 
processor core. This provides a significant improvement in performance of the processor 
core in situations where an appropriate accuracy in the prediction by the prefetch unit can 
be achieved to make any extra checking step required by the core (e.g. step 540 of figure 
5B) worthwhile. 

Although a particular embodiment of the invention has been described herein, 
it will be apparent that the invention is not limited thereto, and that many 
modifications and additions may be made within the scope of the invention. For 
example, various combinations of the features of the following dependent claims could 
be made with the features of the independent claims without departing from the scope 
of the present invention. 



